ABSTRACT

Simultaneous multithreading is a technique that permits multiple independent threads to 
issue multiple instructions each cycle. In previous work we demonstrated the performance 
potential of simultaneous multithreading, based on a somewhat idealized model. In this 
paper we show that the throughput gains from simultaneous multithreading can be achieved 
without extensive changes to a conventional wide-issue superscalar, in either its hardware 
structures or their sizes. We present an architecture for simultaneous multithreading that achieves 
three goals: (1) it minimizes the architectural impact on the conventional superscalar design, 
(2) it has minimal performance impact on a single thread executing alone, and (3) it 
achieves significant throughput gains when running multiple threads. Our simultaneous 
multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold 
improvement over an unmodified superscalar with similar hardware resources. This speedup 
is enhanced by an advantage of multithreading previously unexploited in other architectures: 
the ability to favor, for fetch and issue, those threads that are using the processor most 
efficiently each cycle, thereby providing the "best" instructions to the processor. 
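The final point, choosing each cycle which threads receive fetch priority, can be illustrated with a small selection routine. This is a minimal sketch under stated assumptions, not necessarily the policy evaluated in this paper: the priority metric (fewest instructions in flight), the stall flag, and the names `ThreadState` and `pick_fetch_threads` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadState:
    tid: int
    # Hypothetical metric: instructions fetched but not yet issued.
    in_flight: int
    # Hypothetical flag: thread cannot fetch this cycle (e.g. I-cache miss).
    stalled: bool = False


def pick_fetch_threads(threads: List[ThreadState], n: int) -> List[int]:
    """Return up to n thread ids to fetch from this cycle.

    Assumed heuristic: skip stalled threads, then prefer threads with the
    fewest in-flight instructions, breaking ties by lower thread id.
    """
    candidates = [t for t in threads if not t.stalled]
    candidates.sort(key=lambda t: (t.in_flight, t.tid))
    return [t.tid for t in candidates[:n]]
```

For example, with threads 0 through 3 holding 12, 3, 7 (stalled), and 3 in-flight instructions, `pick_fetch_threads(threads, 2)` selects threads 1 and 3. Favoring threads with little work queued is one plausible proxy for "using the processor most efficiently", since such threads are less likely to be clogging shared queues.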